530 research outputs found

    Multi Agent Reward Analysis for Learning in Noisy Domains

    In many multi-agent learning problems, it is difficult to determine, a priori, the agent reward structure that will lead to good performance. This problem is particularly pronounced in continuous, noisy domains ill-suited to the simple table backup schemes commonly used in TD(lambda)/Q-learning. In this paper, we present a new reward evaluation method that allows the tradeoff between coordination among the agents and the difficulty of the learning problem each agent faces to be visualized. This method is independent of the learning algorithm and is only a function of the problem domain and the agents' reward structure. We then use this reward efficiency visualization method to determine an effective reward without performing extensive simulations. We test this method in both a static and a dynamic multi-rover learning domain where the agents have continuous state spaces and where their actions are noisy (e.g., the agents' movement decisions are not always carried out properly). Our results show that in the more difficult dynamic domain, the reward efficiency visualization method provides a two-orders-of-magnitude speedup in selecting a good reward. Most importantly, it allows one to quickly create and verify rewards tailored to the observational limitations of the domain.

    Quicker Q-Learning in Multi-Agent Systems

    Multi-agent learning in Markov Decision Problems is challenging because of the presence of two credit assignment problems: 1) how to credit an action taken at time step t for rewards received at t' greater than t; and 2) how to credit an action taken by agent i, considering that the system reward is a function of the actions of all the agents. The first credit assignment problem is typically addressed with temporal difference methods such as Q-learning or TD(lambda). The second credit assignment problem is typically addressed either by hand-crafting reward functions that assign proper credit to an agent, or by making certain independence assumptions about an agent's state space and reward function. To address both credit assignment problems simultaneously, we propose Q Updates with Immediate Counterfactual Rewards-learning (QUICR-learning), designed to improve both the convergence properties and performance of Q-learning in large multi-agent problems. Instead of assuming that an agent's value function can be made independent of other agents, this method suppresses the impact of other agents using counterfactual rewards. Results on multi-agent grid-world problems over multiple topologies show that QUICR-learning can achieve up to thirty-fold improvements in performance over both conventional and local Q-learning in the largest tested systems.
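
    The idea of pairing a counterfactual reward with a temporal difference update can be illustrated with a minimal sketch. It is not the QUICR-learning algorithm itself: the crowding-based `global_reward`, the `capacity` parameter, and the use of `None` to mark a removed agent are all hypothetical choices made for illustration, and the update shown is ordinary tabular Q-learning fed with the agent-specific reward.

```python
from collections import Counter

def global_reward(positions, capacity=2):
    """Hypothetical system reward: penalize crowding beyond `capacity` per cell."""
    counts = Counter(p for p in positions if p is not None)
    return -sum(max(0, n - capacity) ** 2 for n in counts.values())

def counterfactual_reward(positions, i, capacity=2):
    """System reward minus the system reward with agent i removed (None = removed)."""
    without_i = list(positions)
    without_i[i] = None
    return global_reward(positions, capacity) - global_reward(without_i, capacity)

def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update, here driven by an agent-specific reward."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]

# Three agents crowd cell 0; a fourth sits alone in cell 1.
positions = [0, 0, 0, 1]
d0 = counterfactual_reward(positions, 0)  # agent 0 contributes to the crowding
d3 = counterfactual_reward(positions, 3)  # agent 3 does not affect the system reward
q = {}
new_value = q_update(q, state=0, action="stay", reward=d0, next_state=0,
                     actions=["stay", "move"])
```

    In this toy setting the agent that causes the overcrowding receives a negative reward while the uninvolved agent receives zero, which is the sense in which a counterfactual reward filters out the effect of other agents.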

    QUICR-learning for Multi-Agent Coordination

    Coordinating multiple agents that need to perform a sequence of actions to maximize a system-level reward requires solving two distinct credit assignment problems. First, credit must be assigned for an action taken at time step t that results in a reward at time step t' > t. Second, credit must be assigned for the contribution of agent i to the overall system performance. The first credit assignment problem is typically addressed with temporal difference methods such as Q-learning. The second credit assignment problem is typically addressed by creating custom reward functions. To address both credit assignment problems simultaneously, we propose "Q Updates with Immediate Counterfactual Rewards-learning" (QUICR-learning), designed to improve both the convergence properties and performance of Q-learning in large multi-agent problems. QUICR-learning is based on previous work on single-time-step counterfactual rewards described by the collectives framework. Results on a traffic congestion problem show that QUICR-learning is significantly better than a Q-learner using collectives-based (single-time-step counterfactual) rewards. In addition, QUICR-learning provides significant gains over conventional and local Q-learning. Additional results on a multi-agent grid-world problem show that the improvements due to QUICR-learning are not domain specific and can provide up to a tenfold increase in performance over existing methods.

    A Phenomenology of Religion?

    This research explores the possibility of a phenomenology of religion that is ontological, founded on Martin Heidegger’s philosophical thought. The research attempts to utilise Heidegger’s formulation of phenomenology as ontology while also engaging in a critical relation with his path of thinking, as a barrier to the phenomenological interpretation of the meaning of Religion. This research formulates Religion as an ontological problem wherein the primary question becomes: how are humans, in our being, able to be religious and thus also able to understand the meaning of ‘religion’ or something like ‘religion’? This study focuses on the problem of foundation: whether it is possible to provide an adequate foundation for the study of religion(s) via the notion ‘Religion’. Further, this study also aims to explore the problem of methodological foundation: how preconceptions of the meaning of Religion predetermine how religion(s) and religious phenomena are studied. Finally, this research moves toward the possibility of founding a regional ontological basis for the study of religion(s) insofar as the research explores the ontological ground of Religion as a phenomenon. Due to the exploratory and methodological/foundational emphasis of the research, the thesis is almost entirely preliminary. Herein, the research focuses on three main issues: how the notion of Religion is preconceived, how Heidegger’s phenomenology can be tailored to the phenomenon of Religion, and how philosophical thought (in this case, Pre-Socratic philosophy) discloses indications of the meaning of Religion. Pre-Socratic thought is then utilised as a foundation for a preliminary interpretation of how Religion belongs-to humans in our being. This research provides two interrelated theses: the provision of an interpretation of Religion as an existential phenomenon, and an interpretation of Religion in its ground of being-human.
    With regard to the former, I argue that Religion signifies a potential relation with the ‘originary ground’ of life as meaningful. Accordingly, the second interpretation discloses the meaning of Religion as grounded in being-human: for humans in our being, the meaning of life is an intrinsic question/dilemma for us. This being-characteristic, I argue, can be called belief.

    Learning sequences of actions in collectives of autonomous agents


    Efficient Evaluation Functions for Evolving Coordination


    The impact of agent definitions and interactions on multiagent learning for coordination


    Collective Intelligence for Control of Distributed Dynamical Systems

    We consider the El Farol bar problem, also known as the minority game (W. B. Arthur, "The American Economic Review", 84(2): 406-411 (1994); D. Challet and Y.C. Zhang, "Physica A", 256:514 (1998)). We view it as an instance of the general problem of how to configure the nodal elements of a distributed dynamical system so that they do not "work at cross purposes", in that their collective dynamics avoids frustration and thereby achieves a provided global goal. We summarize a mathematical theory for such configuration applicable when (as in the bar problem) the global goal can be expressed as minimizing a global energy function and the nodes can be expressed as minimizers of local free energy functions. We show that a system designed with that theory performs nearly optimally for the bar problem.
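
    For readers unfamiliar with the bar problem, one round of the game can be sketched as follows. This assumes the common threshold formulation (attendees are rewarded only if the bar is not overcrowded, and those who stay home are rewarded only if it is); it does not implement the energy-function theory the abstract summarizes, and the function name and `capacity` parameter are hypothetical.

```python
def el_farol_round(decisions, capacity):
    """Play one round of a threshold-style El Farol bar problem.

    decisions: list of booleans, True if the agent goes to the bar.
    capacity:  largest attendance at which the bar is still enjoyable.
    Returns (attendance, payoffs), where each payoff is 1 for a good
    decision and 0 for a bad one.
    """
    attendance = sum(1 for go in decisions if go)
    crowded = attendance > capacity
    # Going is good when the bar is not crowded; staying home is good when it is.
    payoffs = [1 if bool(go) != crowded else 0 for go in decisions]
    return attendance, payoffs

# Five agents, three of whom attend.
decisions = [True, True, True, False, False]
crowded_result = el_farol_round(decisions, capacity=2)
roomy_result = el_farol_round(decisions, capacity=3)
```

    With a capacity of 2 the three attendees lose and the two who stayed home win; with a capacity of 3 the outcome reverses, which illustrates how agents acting on the same signal can end up working at cross purposes.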

    Genome-wide DNA methylation analysis of transient neonatal diabetes type 1 patients with mutations in ZFP57

    Background: Transient neonatal diabetes mellitus 1 (TNDM1) is a rare imprinting disorder characterized by intrauterine growth retardation and diabetes mellitus that usually presents within the first six weeks of life and resolves by the age of 18 months. However, patients have an increased risk of developing diabetes mellitus type 2 later in life. Transient neonatal diabetes mellitus 1 is caused by overexpression of the maternally imprinted genes PLAGL1 and HYMAI on chromosome 6q24. One of the mechanisms leading to overexpression of the locus is hypomethylation of the maternal allele of PLAGL1 and HYMAI. A subset of patients with maternal hypomethylation at PLAGL1 have hypomethylation at additional imprinted loci throughout the genome, including GRB10, ZIM2 (PEG3), MEST (PEG1), KCNQ1OT1 and NESPAS (GNAS-AS1). About half of the TNDM1 patients carry mutations in ZFP57, a transcription factor involved in establishment and maintenance of methylation of imprinted loci. Our objective was to investigate whether additional regions are aberrantly methylated in ZFP57 mutation carriers.
    Methods: Genome-wide DNA methylation analysis was performed on four individuals with homozygous or compound heterozygous ZFP57 mutations, three relatives with heterozygous ZFP57 mutations and five controls. Methylation status of selected regions showing aberrant methylation in the patients was verified using bisulfite sequencing.
    Results: We found large variability among the patients concerning the number and identity of the differentially methylated regions, but more than 60 regions were aberrantly methylated in two or more patients, and a novel region within PPP1R13L was found to be hypomethylated in all the patients. The hypomethylated regions in common between the patients are enriched for the ZFP57 DNA binding motif.
    Conclusions: We have expanded the epimutational spectrum of TNDM1 associated with ZFP57 mutations and found one novel region within PPP1R13L which is hypomethylated in all TNDM1 patients included in this study. Functional studies of the locus might provide further insight into the etiology of the disease.

    To Adapt or Not to Adapt – Consequences of Adapting Driver and Traffic Light Agents

    One way to cope with the increasing traffic demand is to integrate standard solutions with more intelligent control measures. However, the result of possible interferences between intelligent control or information provision tools and other components of the overall traffic system is not easily predictable. This paper discusses the effects of integrating co-adaptive decision-making regarding route choices (by drivers) and control measures (by traffic lights). The motivation behind this is that optimization of traffic light control is starting to be integrated with navigation support for drivers. We use microscopic, agent-based modelling and simulation, as opposed to classical network analysis, because this work focuses on the effect of local adaptation. In a scenario that exhibits features comparable to real-world networks, we evaluate different types of adaptation by drivers and by traffic lights, based on local perceptions. In order to compare the performance, we have also used a global-level optimization method based on genetic algorithms.